1 Introduction

Symmetric groups have been extensively studied in mathematics for over 3 centuries with a 
tradition going back to Lagrange. Many fundamental applications now exist in almost every 
branch of mathematics. In recent decades, statisticians and probabilists made attempts to 
understand their properties as finite probability spaces. Many of the important quantities 
on the symmetric groups can be extended to a metric structure. 

The ranking of n data points, for example, can be construed as a permutation in $S_n$. One
can thus assess the difference between two sets of data by defining metrics on the symmetric 
group. If the two sets of data are components of n samples of a bivariate variable, then one 
would like the distance function to be invariant from one side, since it should be invariant 
under relabeling of the samples. In the latter case, the metric serves as a non-parametric 
correlation function. 

Another theoretical use of metrics on the symmetric groups is to understand the convergence
rate of shuffling models. Here, to confuse things further, the rate of convergence is
measured in terms of another type of metric, which lives not on the group itself but on
the space of all probability measures on $S_n$. Typically one considers total variation distance
or, in the presence of an underlying metric on the probability space, the transportation
distance associated with the underlying metric. In the former case, one particular statistic,
the Hamming distance, has proved quite useful in getting tight lower bounds in many models.
The latter is less well studied but conceivably is most naturally attacked via the underlying
metrics as test statistics.

As in principal component analysis, one would like to remove redundant information by 
identifying metrics that are statistically dependent on one another. It is thus surprising 
that many metrics that are dependent for finite n become asymptotically independent (in 
a sense made precise later) as n approaches infinity. This suggests that for large samples, 
these metrics give truly distinct measures of correlation. On the other extreme, one also
finds certain pairs of metrics that are asymptotically perfectly correlated. In this paper we
examine some of the most popular metrics on the symmetric groups and their asymptotic 
statistical relations. 
2 Metrics on finite groups



In general, a metric on a set $G$ of points is any bivariate nonnegative-valued function $\rho$ that
satisfies the following three properties for all $x, y, z \in G$:

1. Nondegeneracy: $\rho(x, y) = 0$ if and only if $x = y$,

2. Symmetry: $\rho(x, y) = \rho(y, x)$, and

3. Triangle inequality: $\rho(x, z) \le \rho(x, y) + \rho(y, z)$.

If the underlying set happens to be a group, then additional structures can be imposed. 
We say that $\rho$ is left-invariant or right-invariant if, respectively,

$$\rho(ax, ay) = \rho(x, y)$$

or

$$\rho(xa, ya) = \rho(x, y)$$

for all $a, x, y \in G$.

It is said to be bi-invariant if it is both left and right invariant. 
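The invariance conditions can be checked by brute force on a small symmetric group. The sketch below is illustrative only: it represents permutations as 0-indexed tuples, and the helper names (`compose`, `is_invariant`) and the labels "post" and "pre" are assumptions introduced here. Which of the two compositions is called "left" depends on the convention for the group product, so the code names the compositions explicitly instead.

```python
from itertools import permutations

def compose(a, b):
    """Return the permutation i -> a[b[i]] (apply b first, then a)."""
    return tuple(a[b[i]] for i in range(len(a)))

def hamming(x, y):
    """Number of positions where x and y disagree."""
    return sum(1 for u, v in zip(x, y) if u != v)

def footrule(x, y):
    """Sum of |x(i) - y(i)| over all positions i."""
    return sum(abs(u - v) for u, v in zip(x, y))

def is_invariant(dist, n, side):
    """Check dist(a.x, a.y) == dist(x, y) ('post') or dist(x.a, y.a) == dist(x, y) ('pre')
    for all a, x, y in S_n by exhaustive enumeration."""
    perms = list(permutations(range(n)))
    for a in perms:
        for x in perms:
            for y in perms:
                if side == "post":
                    moved = dist(compose(a, x), compose(a, y))
                else:
                    moved = dist(compose(x, a), compose(y, a))
                if moved != dist(x, y):
                    return False
    return True
```

Under these conventions the Hamming distance passes both checks, while the position-wise footrule distance is invariant only under relabeling of positions (`"pre"`), not under relabeling of values.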

A left-invariant metric $\rho$ (and similarly a right-invariant one) can be characterized by the
univariate function $f(x) = \rho(\mathrm{Id}, x)$, so that

$$\rho(x, y) = f(x^{-1}y).$$

In this setting, the three properties of a metric can be rephrased as follows:

1. Nondegeneracy: $f(x) = 0$ if and only if $x = \mathrm{Id}$,

2. Symmetry: $f(x) = f(x^{-1})$ for all $x \in G$, and

3. Triangle inequality: $f(x) \le f(y) + f(x^{-1}y)$ or $f(x) \le f(xy) + f(y)$, for all $x, y$.

It is a challenge to give statistical meaning to the triangle inequality in applications.

For a bi-invariant metric, its characterizing univariate function is also a class function,

$$f(axa^{-1}) = f(x),$$

which means that it projects to a function on the set of conjugacy classes of $G$. When
$G = S_n$, a symmetric group, the conjugacy class to which a permutation belongs is uniquely
determined by its cycle structure. Therefore the set of conjugacy classes is parametrized
by $\mathcal{P}_n$, the set of partitions of $n$. Whenever we talk about a class function $f$ on $S_n$, it will
simultaneously denote the function on $S_n$ and the projected function on $\mathcal{P}_n$, whenever there
is no confusion.
3 Asymptotic independence



Before we study statistical dependence on the symmetric group, it is necessary to have a
clear definition of asymptotic independence. Given two sequences of real-valued random
variables $X_i, Y_i$, $i = 1, 2, \ldots$, we distinguish between two modes of asymptotic independence:

Definition 1. $X_i$ and $Y_i$ are said to be weakly asymptotically independent if for all bounded
continuous functions $f, g$,

$$\lim_{i \to \infty} E[f(X_i) g(Y_i)] = \lim_i E f(X_i) \, \lim_i E g(Y_i).$$

They are said to be asymptotically independent in moment if all of their moments exist and
for all $j, k \in \mathbb{N}$,

$$\lim_i E[X_i^j Y_i^k] = \lim_i E X_i^j \, \lim_i E Y_i^k.$$

Notice that the definitions above imply that both sequences have weak limits, and weak
limits with all moments finite, respectively. Moment independence clearly implies weak
independence. The other direction is also true, provided that $X_i$ and $Y_i$ both converge in
moments:

Proof. Consider the bounded functions $f_{N,k}(x) = g_{N,k}(x) = |x|^k 1_{\{|x| < N\}}$. Then for fixed $N$,

$$\lim_i E[f_{N,j}(X_i) g_{N,k}(Y_i)] = \lim_i E[f_{N,j}(X_i)] \, \lim_i E[g_{N,k}(Y_i)].$$

Taking $N \to \infty$ on both sides, interchanging limits on the left and using continuity of the
function $f(x, y) = xy$ on the right, we obtain

$$\lim_i E[X_i^j Y_i^k] = \lim_i E[X_i^j] \, \lim_i E[Y_i^k].$$

□

Since we are dealing mostly with random variables that have moments of all orders
when scaled appropriately, we can talk about the two modes of asymptotic independence
interchangeably. Unlike modes of convergence, asymptotic independence is essentially a weak
phenomenon, hence there are no stronger versions such as in $L^p$ or pointwise.

Often one needs to know roughly how large an $n$ suffices for the statistical implications of
two functions to decouple. This requires a distance function on the space of all probability
measures on $\mathbb{R}^2$. The one commonly used in Berry-Esseen type estimates is the Kolmogorov
distance, defined by

$$d(\mu, \nu) = \sup_x |\mu((-\infty, x]) - \nu((-\infty, x])|.$$

Other distances, such as total variation, are less appropriate due to the presence of atoms in
one of the measures.

The definition below is based on Kolmogorov's distance:

Definition 2. Two sequences of random variables $X_i, Y_i$, $i = 1, 2, \ldots$, are said to be
asymptotically independent at the rate $r(i)$ under the Kolmogorov distance if

$$\sup_{x, y \in \mathbb{R}} |P[X_i \le x, Y_i \le y] - P[X_i \le x] P[Y_i \le y]| = O(r(i)).$$
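For small $n$ the supremum in Definition 2 can be evaluated exactly by enumerating $S_n$. The following is a minimal sketch under stated assumptions: permutations are 0-indexed tuples, the function name `kolmogorov_dependence` is hypothetical, and the two example statistics (the footrule and the number of fixed points, a class function) are chosen here only for illustration. Because both the joint distribution function and the product of the marginals are step functions that jump only at attained values, it suffices to scan those values.

```python
from fractions import Fraction
from itertools import permutations

def footrule(sigma):
    """Spearman's footrule of a 0-indexed permutation tuple."""
    return sum(abs(i - s) for i, s in enumerate(sigma))

def fixed_points(sigma):
    """Number of fixed points, a class function on S_n."""
    return sum(1 for i, s in enumerate(sigma) if i == s)

def kolmogorov_dependence(stat_x, stat_y, n):
    """Exact sup over (x, y) of |P[X<=x, Y<=y] - P[X<=x] P[Y<=y]| under uniform measure on S_n."""
    pairs = [(stat_x(s), stat_y(s)) for s in permutations(range(n))]
    total = len(pairs)
    xs = sorted({a for a, _ in pairs})
    ys = sorted({b for _, b in pairs})
    worst = Fraction(0)
    for x in xs:
        p_x = Fraction(sum(1 for a, _ in pairs if a <= x), total)
        for y in ys:
            p_y = Fraction(sum(1 for _, b in pairs if b <= y), total)
            p_xy = Fraction(sum(1 for a, b in pairs if a <= x and b <= y), total)
            worst = max(worst, abs(p_xy - p_x * p_y))
    return worst
```

A call such as `kolmogorov_dependence(footrule, fixed_points, 6)` returns an exact rational number; tracking it over increasing $n$ gives a rough empirical feel for the rate $r(n)$, although enumeration is only feasible for very small $n$.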
4 Spearman's footrule, rho, and generalizations



Viewing each permutation as a bijection from $[n]$ to itself, one can easily come up with
"natural" functions on $S_n$. The most obvious ones are of course the coordinate functions,
which however are not good candidates for building correlation kernels. The sum of all the
components is trivially constant, so one has to look harder. Spearman [Sp] gave the
following two analogues of the $L^1$ and $L^2$ norms, known as the footrule and rho respectively:

$$\rho_1(\sigma) = \sum_{i=1}^n |i - \sigma(i)|,$$

$$\rho_2(\sigma) = \sum_{i=1}^n (i - \sigma(i))^2.$$

Note that Spearman's rho differs from the correct $L^2$ norm by a square root. Since the square
root is invertible, this extra facade has no bearing on independence, and it makes calculations
much easier. One can also define the analogue of $L^p$ norms, $\rho_p$, as above. The combinatorial
central limit theorem implies that they all converge weakly to the standard normal when scaled and
recentered, and that the error term can be controlled by a Berry-Esseen type estimate [Bolt].
Thus to understand their limits, it suffices to compute the means and variances. This can
be effectively done by writing the sum out explicitly and splitting into cases where the indices
are the same or different. One can use the same method to compute correlations of $\rho_p$ and
$\rho_q$.

Observe now that any finite linear combination of the $\rho$'s is in the domain of normal
convergence, also by the combinatorial CLT. One thus obtains the full joint distribution of all
the $\rho_p$'s, since they are jointly Gaussian in the limit.
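The mean computation mentioned above can be checked by enumeration for small $n$. The sketch below is illustrative: the function names are hypothetical and permutations are 0-indexed tuples (which leaves the differences $i - \sigma(i)$ unchanged). Averaging $|i - j|^q$ over all pairs gives the closed forms $(n^2 - 1)/3$ for the footrule and $n(n^2 - 1)/6$ for rho, which the enumeration reproduces.

```python
from fractions import Fraction
from itertools import permutations

def rho(sigma, q):
    """Spearman-type statistic: sum over i of |i - sigma(i)|^q."""
    return sum(abs(i - s) ** q for i, s in enumerate(sigma))

def exact_mean(q, n):
    """Exact mean of rho_q under the uniform measure on S_n, by enumeration."""
    perms = list(permutations(range(n)))
    return Fraction(sum(rho(s, q) for s in perms), len(perms))

def exact_variance(q, n):
    """Exact variance of rho_q under the uniform measure on S_n, by enumeration."""
    perms = list(permutations(range(n)))
    mean = exact_mean(q, n)
    return Fraction(sum(rho(s, q) ** 2 for s in perms), len(perms)) - mean ** 2
```

The same enumeration with products `rho(s, p) * rho(s, q)` would give exact covariances for small $n$, which can be compared against the case-splitting computation described in the text.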

5 Bi-invariant metrics and other metrics 

Recall that a metric $\rho$ on a group $G$ is called bi-invariant if $\rho(agb, ahb) = \rho(g, h)$ for all
$a, b, g, h \in G$. Such metrics arise naturally in the context of subjective ranking. Suppose
two wine connoisseurs are to rank $n$ bottles of wine, and we want to assess how correlated
their tastes are. Then not only is the order in which the wines are presented irrelevant
for the analysis, so are the actual rankings themselves. What matters is whether or not the
two connoisseurs assign the same relative value to each bottle. Thus if their rankings are
presented as two permutations, and their valuation difference is measured by a metric, then
the metric should be bi-invariant.

More generally, one could consider data valued in an unordered set. A good example
is given by the election of cabinet members. Suppose voters (or say the president and
the congress) are to assign $n$ office positions to $n$ candidates already cleared for hiring.
It would be useful to know how much the voters agree or differ on how to match. Since
the cabinet positions are supposed to be non-hierarchical (unlike the president and the
vice-president), one expects the measure of discordance to be invariant both under the reshuffling
of the candidates and of the positions sought. Of course in politics there are typically more
candidates than available positions. Private companies on the other hand tend to preserve
the existing work force by matching employees with jobs, such as during intern assignment.

Furthermore, subjective assignments in the ranked setting often clash with objective 
scales that should correlate strongly with the ranking. In the wine tasting example, the n 
bottles might be made in different years. One could on the one hand tabulate each judge's 
rankings against the production years, and compare the resulting permutations using some 
left-invariant metric such as Spearman's rho, and on the other hand compare their rankings 
alone using some bi-invariant metric, such as the Hamming distance. It would be useful 
to know whether the two measures of disarray provide more information than one measure 
alone. The results below give affirmative answers, at least when n is sufficiently large. 

Lemma 5.1. Let $\lambda^n \vdash n$ be given for each $n$ such that the number of parts in $\lambda^n$,
$t := n(\lambda^n) < n^{\epsilon}$ for $\epsilon < 1/6$. Then

$$\lim_n E[f(\hat\rho(\sigma)) \mid \lambda(\sigma) = \lambda^n] = \lim_n E[f(\hat\rho(\sigma))]$$

where $f$ denotes any continuous function of polynomial growth and $\rho$ denotes any of the
following

• Spearman's $\rho_q$ for $1 \le q < \infty$,

• Kendall's tau

• length of the longest increasing/decreasing subsequence

and $\hat\rho$ stands for the affine normalization of $\rho$ to have mean 0 and variance 1.

Remark 1. We can easily generalize the result to other functions, but instead choose to focus 
on the well-known ones for simplicity. 

Proof. Write $\sigma \in S_n$ in the following record cycle form:

$$\sigma = (a_{11} \ldots a_{1 s_1}) \ldots (a_{t1} \ldots a_{t s_t})$$

with the property that $a_{i1} > a_{ij}$ for all $i, j$, and $a_{11} < a_{21} < \ldots < a_{t1}$. The record map
$r : S_n \to S_n$ is defined by

$$r(\sigma)\Bigl(\sum_{i=1}^{k-1} s_i + j\Bigr) = a_{kj}, \quad 1 \le k \le t, \ 1 \le j \le s_k.$$

In words, we remove all the brackets in the record cycle representation of $\sigma$ and treat the
resulting sequence as the second row of a permutation written in 2-line form.
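The record map can be sketched in a few lines. This is an illustrative implementation under stated assumptions: permutations are 0-indexed tuples with `sigma[i]` the image of `i`, and the function names are hypothetical. Each cycle is rotated so that its maximum comes first, cycles are sorted by increasing maximum, and the brackets are dropped.

```python
from itertools import permutations

def cycles_record_form(sigma):
    """Cycles of sigma, each starting at its maximum, sorted by increasing maximum."""
    n = len(sigma)
    seen = [False] * n
    cycles = []
    for start in range(n):
        if seen[start]:
            continue
        cyc = []
        j = start
        while not seen[j]:
            seen[j] = True
            cyc.append(j)
            j = sigma[j]
        m = cyc.index(max(cyc))
        cycles.append(cyc[m:] + cyc[:m])   # rotate so the maximum leads
    cycles.sort(key=lambda c: c[0])
    return cycles

def record_map(sigma):
    """Drop the brackets of the record cycle form and read the result as a one-line permutation."""
    return tuple(x for cyc in cycles_record_form(sigma) for x in cyc)

def count_records(word):
    """Number of left-to-right maxima of a sequence."""
    best, count = -1, 0
    for x in word:
        if x > best:
            best, count = x, count + 1
    return count
```

Since the cycle leaders become exactly the left-to-right maxima of the output word, the brackets can be recovered from the word, which is why the map is a bijection; the checks below confirm this on $S_4$.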

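Let $S_n^\lambda$ denote the set of permutations with cycle structure $\lambda$. Given $\lambda \vdash n$, consider the
following map

$$F^\lambda : S_n \to S_n^\lambda,$$

$$\sigma \mapsto (\sigma(1) \ldots \sigma(\lambda_1))(\sigma(\lambda_1 + 1) \ldots \sigma(\lambda_1 + \lambda_2)) \ldots (\sigma(\lambda_1 + \ldots + \lambda_{t-1} + 1) \ldots \sigma(n)).$$

This map pushes the uniform measure on $S_n$ onto the uniform measure on $S_n^\lambda$. So
$M^\lambda := F^\lambda \circ r : S_n \to S_n^\lambda$ also pushes the uniform measure to uniform. Thus

$$E[f(\hat\rho_q(\sigma)) \mid \lambda(\sigma) = \lambda] = E[f(\hat\rho_q(M^\lambda(\sigma)))]. \quad (1)$$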

Furthermore $F^\lambda$ changes at most $n^\epsilon$ coordinates of $\sigma$, by the condition on $\lambda$. Therefore
$M^\lambda$ changes at most $n^\epsilon + n(\lambda(\sigma))$ coordinates. By the central limit theorem for the number of
cycles, $P[n(\lambda(\sigma)) > k \log n] = O(n^{-(k-1)^2/2})$.

In the case of Spearman's $\rho_q$ function, the standard deviation is of order $n^{q+1/2}$, and
changing $O(n^\epsilon)$ coordinates alters its value by $O(n^{q+\epsilon}) \ll n^{q+1/2}$, hence we have the
following convergence under the uniform measure on $S_n$:

$$\lim_{n \to \infty} P\bigl[|(\rho_q(M^\lambda(\sigma)) - \rho_q(\sigma))/\sqrt{\mathrm{var}\,\rho_q}| > \epsilon\bigr] = 0,$$

for all $\epsilon > 0$. In other words

$$\lim_n \hat\rho_q(M^\lambda(\sigma)) - \hat\rho_q(\sigma) = 0,$$

in probability.

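Combining with (1), we have

$$\lim_n E[f(\hat\rho_q(\sigma)) \mid \lambda(\sigma) = \lambda] - E[f(\hat\rho_q(\sigma))] = \lim_n E[f(\hat\rho_q(M^\lambda(\sigma))) - f(\hat\rho_q(\sigma))]\,(1 - P[n(\lambda) < k \log n]) + \max_\sigma \hat\rho_q(\sigma)\, P[n(\lambda) > k \log n] = 0,$$

for $k > 2$, by the dominated convergence theorem, and the fact that

$$\max_\sigma \hat\rho_q(\sigma) = O(n^{q+1})/\sqrt{\mathrm{var}\,\rho_q}.$$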

Kendall's tau $\tau(\sigma) := \sum_{i<j} 1_{\{\sigma(i) > \sigma(j)\}}$ has variance of order $n^3$, whereas the change of
one coordinate value would affect $O(n)$ terms in the sum, each of which has contribution
$O(1)$. Therefore $|\tau(M^\lambda(\sigma)) - \tau(\sigma)| = O(n^{1+\epsilon}) = o(\sqrt{\mathrm{var}\,\tau})$. Similarly, the length of the
longest increasing subsequence $U(\sigma)$ has variance of order $n^{1/3}$ whereas changing one coordinate
in $\sigma$ would change $U$ by at most 2. Thus the same argument as for Spearman's rho functions
applies to the latter two cases as well. □
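The sensitivity bounds used in the proof can be verified exhaustively for small $n$. The sketch below makes one interpretive assumption that is not spelled out in the text: "changing one coordinate" is modeled as swapping the values at two positions, so that the result is still a permutation. The function names are hypothetical, and the longest increasing subsequence is computed with the standard patience-sorting technique.

```python
from bisect import bisect_left
from itertools import combinations, permutations

def kendall_tau(sigma):
    """Number of inversions: pairs i < j with sigma(i) > sigma(j)."""
    return sum(1 for i, j in combinations(range(len(sigma)), 2) if sigma[i] > sigma[j])

def lis_length(sigma):
    """Length of the longest increasing subsequence (patience sorting)."""
    tails = []
    for x in sigma:
        pos = bisect_left(tails, x)
        if pos == len(tails):
            tails.append(x)
        else:
            tails[pos] = x
    return len(tails)

def swap_values(sigma, i, j):
    """Exchange the values at positions i and j."""
    out = list(sigma)
    out[i], out[j] = out[j], out[i]
    return tuple(out)

def max_change(stat, n):
    """Largest change of stat over all permutations in S_n and all single swaps."""
    worst = 0
    for s in permutations(range(n)):
        for i, j in combinations(range(n), 2):
            worst = max(worst, abs(stat(swap_values(s, i, j)) - stat(s)))
    return worst
```

On $S_5$ a single swap moves Kendall's tau by at most $2n - 3$ (only pairs involving one of the two swapped positions can flip) and moves the longest increasing subsequence by at most 2, matching the $O(n)$ and $O(1)$ sensitivities quoted in the proof.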

Corollary 5.2. Any sequence of class functions $f_n$ on $S_n$ with a weak limit is asymptotically
independent of all the functions listed below, with the second column giving an upper bound on
the rates of convergence:

• The normalized Spearman's rho's, $\hat\rho_q$, $1 \le q < \infty$; $r(n) = \log n / n^{1/2}$.

• The normalized Kendall's tau; r{n) = logn/n^^"^ . 

• The normalized Ulam's statistic (longest increasing subsequence); r{n) = logn/n^/^. 
Proof. It is well-known that the number of cycles $n(\sigma)$ in a uniformly chosen permutation
satisfies the central limit theorem with mean $\log n$ and variance $\log n$. Thus for any $k > 1$,

$$P[n(\sigma) > k \log n] = O(\mathrm{Erf}(-(k-1)\sqrt{\log n})) = O(n^{-(k-1)^2/2}).$$

Next using the estimate of the previous lemma, we have for n{X) < klogn, and k > 2, 

P[p,(a) < a - -^\Ko) = A] < P[p,(a) < a] < P[p,(a) < a + ^^]. 

Using the fact that pq weakly converges to a standard normal, and in fact satisfies a Berry- 
Esseen's error estimate of order 

|P[p,(a) < a|A(a) = A] - P[p,(a) < a]| = — 

for some universal C. 

Next since /„ are class functions, they project to functions on the set Vn of partitions of 
n. Summing over all A with n(A) < fclogn, we obtain 

P[p,(a) < a,/„(a) < 6] = P[P.(^)l{/„{A)<6}|A(a) = A]P[5„V 

A:n(A)<fc log n 
\:n{\)<.k log n 

where |eA| < Clogri/n-'^/^, save a set of A's of probability at most 0(n^^/^). Finally, 

P[p,(a) < a, Ua) <h]- P[p,(a) < a]P[/„(a) < 6] = ^ ^^^t^"] + 0{n~^'^) + ©(n^-^') 

A:n(A)<A: log n 

< Oi\ogn/n^''^). 

The proofs for the other two pairs (Kendall's tau and Ulam's statistic) follow the same
argument and are omitted. □



6 An application 

An interesting spin-off of the Spearman-type statistics is called the oscillation of
permutations, defined as

$$\rho_q^{(1)}(\sigma) = \sum_{i=1}^n |\sigma(i+1) - \sigma(i)|^q,$$

with indices taken modulo $n$. The notation $\rho^{(1)}$ suggests the analogy with Sobolev norms in
classical analysis. It was first shown in [Bail] that a central limit theorem can be proved about
such statistics, by the moment method. Later in [Chao] a tight Berry-Esseen error estimate was
derived using Stein's method. More precisely, they showed that

$$\sup_x |P[\hat\rho_1^{(1)}(\sigma) \le x] - \mathrm{Erf}(x)| = O(n^{-1/2}).$$
Consider now an $n$-cycle $\tau$ derived from $\sigma$ by the following recipe:

$$\tau^{\circ k}(i_0) = \sigma(k)$$

for some fixed starting index $i_0 \in [n]$. Here $\tau^{\circ k} = \tau \circ \ldots \circ \tau$ for $k$ times. Then it is easy to
verify that

$$\rho_q^{(1)}(\sigma) = \rho_q(\tau).$$

We denote the map $\sigma \mapsto \tau$ by $M^{(n)} : S_n \to S_n^{(n)}$. Then each one is measure-preserving, and
using the Hoeffding combinatorial central limit theorem [Bolt] with Bolthausen error term
for $\rho_q$, we obtain

$$P[\hat\rho_q^{(1)}(\sigma) \le x] = P[\hat\rho_q(\tau) \le x]$$

$$= P[\hat\rho_q(\sigma) \le x \mid \lambda(\sigma) = (n)]$$

$$= \mathrm{Erf}(x) + O\Bigl(\frac{\log n}{\sqrt{n}}\Bigr).$$

So we are off by a factor of $\log n$ in the error term, but the proof is significantly shorter than
using Stein's method as in [Chao]. Note also that we get the same normalization affine map
for $\rho_q^{(1)}$ and $\rho_q$.
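The identity between the oscillation of $\sigma$ and the Spearman statistic of the associated $n$-cycle is easy to confirm numerically. The sketch below is a small illustration, assuming 0-indexed tuples and reading the recipe as "the $n$-cycle sends $\sigma(k)$ to $\sigma(k+1)$, indices modulo $n$"; the function names are hypothetical.

```python
from itertools import permutations

def rho(sigma, q):
    """Spearman-type statistic: sum over i of |i - sigma(i)|^q."""
    return sum(abs(i - s) ** q for i, s in enumerate(sigma))

def oscillation(sigma, q):
    """Sum over k of |sigma(k+1) - sigma(k)|^q with indices taken modulo n."""
    n = len(sigma)
    return sum(abs(sigma[(k + 1) % n] - sigma[k]) ** q for k in range(n))

def cycle_from_word(sigma):
    """The n-cycle tau with tau(sigma(k)) = sigma(k+1), indices modulo n."""
    n = len(sigma)
    tau = [0] * n
    for k in range(n):
        tau[sigma[k]] = sigma[(k + 1) % n]
    return tuple(tau)

def is_single_cycle(tau):
    """True if tau consists of exactly one cycle of full length."""
    j, steps = tau[0], 1
    while j != 0:
        j, steps = tau[j], steps + 1
    return steps == len(tau)
```

Each term $|\sigma(k+1) - \sigma(k)|^q$ of the oscillation is the term $|\tau(i) - i|^q$ of $\rho_q(\tau)$ at $i = \sigma(k)$, so the two sums agree term by term.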

Using the same method, one could get a central limit theorem with near-tight error bound
for all the analogues of Sobolev norms, $\rho_q^{(k)}$, provided one can establish a CLT for random
diagonals of the form $\sum_{i=1}^n a_{i, \sigma(i), \ldots, \sigma^k(i)}$ in higher dimensional arrays. A less ambitious
proposition is to show that the skip-2 (or in general skip-$k$) Sobolev norms $\rho_{q,2}^{(1)}$ are
asymptotically Gaussian:

$$\rho_{q,2}^{(1)}(\sigma) = \sum_{i=1}^n |\sigma(i+2) - \sigma(i)|^q,$$

where again the summation indices are taken modulo $n$. Using the same conditioning argument
as before, one needs a CLT for the following variant of the Spearman's $\rho$ metrics:

$$\rho_{q,2}(\sigma) = \sum_{i=1}^n |\sigma^2(i) - i|^q.$$

Proof. Observe that the set $\{\sigma^2 : \sigma \in S_n\}$ consists of permutations $\tau$ with $\alpha_{2k}(\tau) = 0$ mod
2, for all $k \le n$, where $\alpha_j(\sigma)$ denotes the number of $j$-cycles in $\sigma$. Define a bijection $\tau$ on
$S_n$, $\tau : \sigma \mapsto \sigma'$, as follows: given $\sigma \in S_n$, and a $j$-cycle $\gamma = (a_1 a_2 \ldots a_j)$ in $\sigma$ arranged so
that $a_1 = \max_i a_i$ (i.e. in record form), we will let $\gamma'$ be a $j$-cycle in the image $\sigma'$:

1. if $j = 2k$, and say $\gamma = (a_1 a_2 \ldots a_{2k})$, let $\gamma' = (a_1 a_3 \ldots a_{2k-1} a_2 a_4 \ldots a_{2k})$.

2. if $j = 2k + 1$, let $\gamma' = (a_1 a_3 \ldots a_{2k+1} a_2 \ldots a_{2k})$.

The resulting $\sigma'$ has the same cycle structure as $\sigma$ and the map described above is a conjugacy
class preserving bijection. Note that it is necessary to standardize each cycle to the record
form (or by some other convention) in order for $\tau$ to be bijective.

Furthermore, when we post-compose $\tau$ with the appropriate bracket inserting operation
$\beta$, we recover the familiar map $\sigma \mapsto \sigma^2$. Here $\beta(\sigma)$ breaks each even cycle in $\sigma$ arranged with
largest element first into two equal smaller ones, at a position determined by the action of the
map $\sigma \mapsto \sigma^2$ on that cycle; it leaves the odd cycles untouched. For example, if $\gamma = (612345)$,
$\tau(\gamma) = (624135)$ and $\beta \circ \tau(\gamma) = (624)(135) = (624)(513)$.
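The factorization of squaring as "rearrange each cycle, then split the even ones" can be checked directly. The following sketch is illustrative only: it uses 0-indexed tuples, the function names are hypothetical, and a cycle written as a list `[a1, a2, ..., aj]` means `a1 -> a2 -> ... -> aj -> a1`.

```python
from itertools import permutations

def cycles_max_first(sigma):
    """Cycles of sigma, each written with its largest element first."""
    n = len(sigma)
    seen = [False] * n
    cycles = []
    for start in range(n):
        if seen[start]:
            continue
        cyc, j = [], start
        while not seen[j]:
            seen[j] = True
            cyc.append(j)
            j = sigma[j]
        m = cyc.index(max(cyc))
        cycles.append(cyc[m:] + cyc[:m])
    return cycles

def tau_cycle(cyc):
    """Odd-position entries followed by even-position entries: (a1 a3 ... a2 a4 ...)."""
    return cyc[0::2] + cyc[1::2]

def perm_from_cycles(cycles, n):
    """Build the one-line form of the permutation with the given disjoint cycles."""
    out = list(range(n))
    for cyc in cycles:
        for a, b in zip(cyc, cyc[1:] + cyc[:1]):
            out[a] = b
    return tuple(out)

def beta_tau(sigma):
    """Apply tau to each cycle, then split every even-length cycle into two halves."""
    pieces = []
    for cyc in cycles_max_first(sigma):
        image = tau_cycle(cyc)
        if len(cyc) % 2 == 0:
            half = len(cyc) // 2
            pieces += [image[:half], image[half:]]
        else:
            pieces.append(image)
    return perm_from_cycles(pieces, len(sigma))

def square(sigma):
    """The permutation sigma composed with itself."""
    return tuple(sigma[sigma[i]] for i in range(len(sigma)))
```

The worked example from the text corresponds to `tau_cycle([6, 1, 2, 3, 4, 5])`, and the exhaustive comparison with `square` on $S_6$ confirms the factorization for every cycle type up to length 6.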

Now since with high probability, $\sigma$ sampled from the uniform measure has fewer than
$k \log n$ cycles, for any $k > 3$, $\beta$ changes the image of $\tau$ by at most $k \log n$ coordinates with
high probability, so the same perturbation argument presented in the previous sections shows
that for all $\epsilon > 0$,

$$\lim_{n \to \infty} P\bigl[|\rho_q(\tau(\sigma)) - \rho_q(\sigma^2)| > \epsilon \sqrt{\mathrm{var}(\rho_q)}\bigr] = 0,$$

and that almost surely,

$$\lim_{n \to \infty} [\rho_q(\tau(\sigma)) - \rho_q(\sigma^2)]/\sqrt{\mathrm{var}(\rho_q)} = 0.$$

Since $\tau$ is a (measure-preserving) bijection, the random variable $\rho_q(\sigma^2)$ is also
asymptotically Gaussian, and its cumulative distribution function differs from the normal one by
$O(\log n/\sqrt{n})$ as before, as a consequence of the Bolthausen error bound. □

Finally we observe that P2 can be written in the following form: 

n 
i=l 

n n n 

- 2 5^(a(z) ^'^(^) + 2 Yl '^'(^) + 

i=l 1=1 1=1 

n n 

= -8 5]i<7(i) + 2^i<72(i) + c; 

i=l i=l 

where $C_n, C_n'$ are some constants. Thus to establish asymptotic normality of $\rho_2^{(2)}(\sigma)$, it
suffices to show asymptotic joint normality of $\rho_2(\sigma)$ and $\rho_{2,2}(\sigma)$. This however doesn't seem
to follow from elementary considerations.

7 Spearman's uniform metric and other metrics 

The natural $L^\infty$ generalization of Spearman's footrule and rho metric on $S_n$ is given by the
following formula

$$\rho_\infty(\sigma) = \max_{i \le n} |i - \sigma(i)|.$$

In this note we will study the limiting distribution of $H := n - \rho_\infty$ under the uniform measure
on $S_n$ as well as its independence relation with other common metrics on $S_n$.

Throughout the article, lower case p will denote a single point, whereas P will denote a 
collection of points. 

Consider Hammersley's device which samples $n$ iid uniform points in the unit square
$[0, 1]^2$. For each instance $P = \{p_1, \ldots, p_n\}$ of this point process, define $X(p_i) = X_P(p_i) :=$
the number of points to the left of $p_i$, including $p_i$ itself, and $Y(p_i) = Y_P(p_i) :=$ the number of
points below $p_i$, also self-included. We will also define $X'(p_i)$ (resp. $Y'(p_i)$) to be the number
of points strictly to the right of (resp. above) $p_i$. These are almost surely well-defined and
distinct for different $p_i$'s. We can associate to $P$ a permutation $\sigma_P$, defined by $\sigma_P(i) = j$ if
there is some $p_k \in P$ with $X(p_k) = i$ and $Y(p_k) = j$. It is easy to see by symmetry that the
induced measure on $S_n$ is uniform.

The Spearman's uniform metric $\rho_\infty$ can be expressed in terms of Hammersley's coupling
as

$$\rho_\infty(\sigma_P) = \max_{k \le n} f(p_k),$$

where $f(p) = f_P(p) := |X(p) - Y(p)|$. Similarly

$$H(\sigma_P) = \min_{k \le n} h(p_k)$$

where $h(p) := (X(p) + Y'(p)) \wedge (X'(p) + Y(p))$. Thus it suffices to study the latter distribution.
We will abuse notation and write $H(P) = H(\sigma_P)$, and similarly for $\rho_\infty$.
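Hammersley's coupling and the two expressions above can be sketched as follows. The code is a minimal illustration with hypothetical function names; ranks are 1-indexed to match the text, and it uses $X'(p) = n - X(p)$ and $Y'(p) = n - Y(p)$, which follows from the definitions when all coordinates are distinct.

```python
import random

def hammersley_ranks(points):
    """Return lists X, Y where X[k] (resp. Y[k]) is the 1-based rank of point k
    by its first (resp. second) coordinate."""
    n = len(points)
    X, Y = [0] * n, [0] * n
    for rank, k in enumerate(sorted(range(n), key=lambda k: points[k][0]), start=1):
        X[k] = rank
    for rank, k in enumerate(sorted(range(n), key=lambda k: points[k][1]), start=1):
        Y[k] = rank
    return X, Y

def permutation_from_ranks(X, Y):
    """sigma[i - 1] = j whenever some point has X-rank i and Y-rank j."""
    sigma = [0] * len(X)
    for x, y in zip(X, Y):
        sigma[x - 1] = y
    return sigma

def rho_inf(sigma):
    """Spearman's uniform metric: max over i of |i - sigma(i)| (1-indexed)."""
    return max(abs(i - s) for i, s in enumerate(sigma, start=1))

def H_from_ranks(X, Y):
    """min over points of h(p) = min(X + Y', X' + Y) with X' = n - X, Y' = n - Y."""
    n = len(X)
    return min(min(x + (n - y), (n - x) + y) for x, y in zip(X, Y))
```

Since $X + Y' = n - (Y - X)$ and $X' + Y = n - (X - Y)$, the minimum of the two is $n - |X - Y|$, which is why `H_from_ranks` always equals `n - rho_inf(sigma)`.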

Instead of a fixed $n$ ensemble, it is much easier to work with a Poisson point process $\Lambda$
on the unit square with homogeneous rate $\nu \approx n$. We have the following de-Poissonization
theorem (see [Johan] Lemma 2.5).

Theorem 7.1. Let $A_n$ be a monotone (increasing or decreasing) sequence with values in
$[0, 1]$. Define

$$\varphi_A(\lambda) = e^{-\lambda} \sum_{n=0}^{\infty} A_n \frac{\lambda^n}{n!}.$$

Then 

ipA{N - ^/NhgN) --^<An< Va{N + ^N\ogN). 

The proof of the theorem relies on approximating the Poisson distribution by the normal
distribution near its mean. One can first prove it for $A_n$ increasing, and then consider
$B_n = 1 - A_n$ to establish the decreasing case. Since $\varphi_A(N - \sqrt{N \log N})$ approximates the
average of $A_m$ for $m$ near $N - \sqrt{N \log N}$, monotonicity yields the first inequality. The second
inequality follows by a similar argument.
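The Poissonized transform $\varphi_A$ is simply the expectation of $A_K$ for a Poisson variable $K$ with mean $\lambda$, and the sandwich can be observed numerically. The sketch below is an illustration under stated assumptions: the function name `poissonize`, the truncation rule, and the example sequence $A_n = n/(n+1)$ are all choices made here, not taken from the text. For this particular sequence $\varphi_A(\lambda) = 1 - (1 - e^{-\lambda})/\lambda$ in closed form, which gives an independent check.

```python
import math

def poissonize(seq, lam, n_max=None):
    """Approximate phi_A(lam) = exp(-lam) * sum_n A_n lam^n / n!, truncating the series
    far in the upper tail of the Poisson distribution."""
    if n_max is None:
        n_max = int(lam + 20 * math.sqrt(lam) + 50)
    total = 0.0
    for n in range(n_max + 1):
        # Poisson weight computed in log space to avoid overflow for large n.
        log_weight = -lam + n * math.log(lam) - math.lgamma(n + 1)
        total += seq(n) * math.exp(log_weight)
    return total

def example_sequence(n):
    """An increasing sequence with values in [0, 1): A_n = n / (n + 1)."""
    return n / (n + 1)

def sandwich(seq, N):
    """Return (phi(N - sqrt(N log N)), A_N, phi(N + sqrt(N log N)))."""
    shift = math.sqrt(N * math.log(N))
    return poissonize(seq, N - shift), seq(N), poissonize(seq, N + shift)
```

For `sandwich(example_sequence, 100)` the three values are increasing, so in this example the inequalities hold even before any error term is added.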

In many situations, we cannot get exact monotonicity. The following corollary is thus 
useful. 
Corollary 7.1. If $A_n \in [0, 1]$ is a sequence that satisfies $A_n(1 + \frac{c}{n^\delta}) \ge A_{n+1}$, for some $c$
and $\delta > 1/2$, then

where $\mu = N - \sqrt{N \log N}$ and $\nu = N + \sqrt{N \log N}$ as before.

Proof Let 7 = iV - ^iV(logiV)2 and define S„ = AIILt+iII + ^)"^ for n > 7 and 
Bn — An otherwise. Then Bn > Bn+i for all n > 7. Observe that if n < 7, 

e"'^— = exp(— + nlog/x — n(logn — 1)) 

77/. 

< exp(n[l +log- - -]) 

< exp(C»(n(^ - 1)2) A - 1)) 
<C»(exp(-(log7V)2)) 

because x — log(l + x) < 0(a;2 A x), for a; > 0. Therefore the contribution of Y11i=q ^n^^^ 
is C(exp(— (log iV)2)) and similarly for ^ replaced by v. 

Note that the sequence S„ e [0, 1]. Hence by the previous theorem we have 

ipB{N - a/TV log TV) -^<B^< ipB{N + ,/N\ogN) + 

Observe that 

< eMcN^-^ log N) < 1 + £1^^. 
Thus < 1 + £M|[ and since ^^(/x) e [0, 1], 

Clog AT 



Similarly, we obtain An-Bn< This gives the first inequality The other inequality 

can be analyzed similarly, with a bigger constant C. □ 

Let $p^* = \arg\max_{p \in P} f(p) = \arg\min_{p \in P} h(p)$, which is almost surely well-defined. Define
$g(p) = d(p, \{x = y\})$, where $\{x = y\}$ is the diagonal segment of the unit square, and $d$ stands
for Euclidean distance. Also let $\bar p = \arg\max_{p \in P} g(p)$. Intuitively, $\bar p$ is the point in $P$ whose
$x$ and $y$ coordinates are furthest apart. Therefore by law of large numbers considerations, one
would expect $f$ to be maximized at $\bar p$.

Lemma 7.2.

$$P_n[p^* = \bar p] = 1 + o_n(1).$$

Proof. Conditional on a typical position of p, wc need to show that with high probability 
H{pi) > H{p) for all Pi G P\ {p}. By a simple calculation, we see that g{p) < n^^'^ log n 
with high probability. Conditional on p, the remaining points are distributed iid uniformly 
on the region {^f < g{p)}- Furthermore, the trapezoidal region T = {g ^ [giP) —n~^^^, g{P)]} 
has no points with high probability because its area is of order n~^/^~^/^ = o(n~^). Thus 
conditional on that event, the remaining points are iid uniformly distributed on {g < g{p) — 
j^-5/9j_ J\f(T) denote the number of points in T. For each pi ^ p in P, 

F[H{pi) < H{p)\p,M{T) - 0,p,] = ¥[M{U) < MiV)] 

< exp(-0(n^/^)) 

where U is the union of the darkly shaded regions, and V is the lightly shaded region in 
Figure ??. 

The last estimate can be easily obtained using multinomial distribution, or Poisson heuris- 
tics. Thus p-almost surely, we have 

P[mini/(pi) < H{p)\p\ = ¥[minH{pi) < H{p)\p,N{T) = 0] + o(l) 

Pii^P Vi+V 

< J2 ^HiPi) < H{p)\p,Af{T) = 0] + o(l) 
Pi¥=P 

=0(1). 

Taking expectation with respect to p concludes the proof. □ 

Lemma 7.3. For fixed $m$, let $A_n := P_n[H(\bar p) \le m]$. Then $A_n \ge A_{n+1}(1 + \frac{c}{n})^{-1}$ for a fixed
constant $c$.

Remark 2. Actually we are able to show $P_n[\rho_\infty(\sigma) \le m]$ is monotone decreasing in $n$ (see
below), hence we can apply the de-Poissonization theorem above directly. However the lemma
also yields asymptotics for $H(\bar p)$, where $\bar p = \arg\max_p y(p) - x(p)$, for which monotonicity
doesn't necessarily hold.

Proof. Conditional on g{pn) — ck, Pn is equally likely to be at any point on the union of the 
line segments {(? = a}, for any n. Furthermore, conditional on the exact position of p, we 
have by geometric domination 

^n[H{p) < m\p\ > Fn+i[H{p) < m\p]. 

Therefore, 

Fn[H{p) < m\g{p) = a] > Fn+i[H{p) < m\g{p) = a]. 

Finally observe that ¥n[g{p) < a] = (1 — (^ — o;)^)"', which is the probability that there are 
no points in the top left corner of height ^ — a, hence 

¥n[g{p) e da] = 2n(l - a){l - (1 - a)')"-^ 
This yields $P_{n+1}[g(\bar p) \in d\alpha]/P_n[g(\bar p) \in d\alpha] \le 1 + \frac{1}{n}$. Integrating with respect to $\alpha$ gives the
result.

□

Lemma 7.4. Under $P^\nu$, the law of the Poisson point process with uniform rate $\nu$ on $[0, 1]^2$,
$\frac{1}{\sqrt{\nu}} H(\bar p)$ converges weakly to the square root of an exponential distribution with mean 1/2.

Proof. We first study the distribution of $g(\bar p)$. Fold the unit square in half along the diagonal
$\{x = y\}$, and rotate it so that the hypotenuse is contained in the $x$-axis and the resulting
triangle $\Delta$ sits on the upper half plane. This would also overlap the points in the two
half-triangles on the original square into the same triangle $\Delta$. Then $g(\bar p)$ is given by the height
of the highest point in the new Poisson point process on $\Delta$, with uniform rate $\mu = 2\nu$. The
probability that $g(\bar p) \le \alpha$ is the same as the probability that there are no points in the top
similar triangle $\Delta_{1/\sqrt{2} - \alpha}$ of height $\frac{1}{\sqrt{2}} - \alpha$, which is easily calculated to be

$$P^{2\nu}[N(\Delta_{1/\sqrt{2} - \alpha}) = 0] = \exp\bigl(-2\nu(\tfrac{1}{\sqrt{2}} - \alpha)^2\bigr).$$

By a change of variable $\beta = \frac{1}{\sqrt{2}} - \alpha$, we can compute the density of $\sqrt{\nu}\beta$:

$$P^{2\nu}(\sqrt{\nu}\beta \in db) = -\frac{d}{db} e^{-2b^2} \, db = 4b e^{-2b^2} \, db.$$
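The same exponential tail appears without Poissonization, which gives a quick numerical sanity check. The sketch below assumes the fixed-$n$ model of $n$ i.i.d. uniform points (an assumption made here for illustration; the lemma itself is stated for the Poisson process). The region at distance more than $\frac{1}{\sqrt{2}} - h$ from the diagonal consists of two corner triangles of area $h^2$ each, so the probability that it is empty is $(1 - 2h^2)^n$. The function names are hypothetical.

```python
import math

def prob_empty_fixed_n(n, b):
    """P[no point within the two corner triangles of height h = b / sqrt(n)]
    for n i.i.d. uniform points in the unit square."""
    h = b / math.sqrt(n)
    return (1.0 - 2.0 * h * h) ** n

def prob_empty_limit(b):
    """Limiting value exp(-2 b^2), the tail of the square root of an
    exponential random variable with mean 1/2."""
    return math.exp(-2.0 * b * b)

def limit_density(b):
    """Density 4 b exp(-2 b^2) of the limiting law on b > 0."""
    return 4.0 * b * math.exp(-2.0 * b * b)
```

Comparing `prob_empty_fixed_n(10**6, b)` with `prob_empty_limit(b)` for a few values of `b` shows agreement to several decimal places, consistent with the density $4b e^{-2b^2}$ obtained above.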

Given $g(\bar p) = \alpha$, $\bar p$ is equally likely to be anywhere on the line $\{g = \alpha\}$, hence the
conditional distribution of $H(\bar p)$ can be calculated by a simple averaging. To avoid such a
calculation, one could observe that $H(\bar p)$ is stochastically squeezed between two Poisson
random variables $L_\beta$ and $U_\beta$, with rates $\nu\sqrt{2}\alpha(1 - \sqrt{2}\alpha) = \nu(1 - \sqrt{2}\beta)\sqrt{2}\beta$ and
$\nu[(1 - \sqrt{2}\beta)\sqrt{2}\beta + \frac{1}{2}(\sqrt{2}\beta)^2]$ respectively. The lower bound $L_\beta$ is obtained by combining the two
rectangles in the region above and to the left of $\bar p$ into one of size $(1 - \sqrt{2}\beta)\sqrt{2}\beta$, and the
upper bound $U_\beta$ is obtained by looking at the worst case when $\bar p$ is at an edge of the unit
square.

Since $(1 - \sqrt{2}\alpha)^2$ is of order $\nu^{-1}$ with high probability, the Kolmogorov distance
between these two Poisson variables is very small compared to their means.

we can now estimate the moments of -^H{p) by -^L^_^ and -^U^_^, with f3 dis- 
tributed as above: 

= u-'^/'lM'^ - V2f3)V2V^f3]' + z/-^/2E[i?fc_i(/3)|/3] 
- [(1 - V2x/i;/3]^ + u-^'-'^/^E[Rk-i{ 



where Rk-i{P) is a linear combinations of factorial moments of of degree at most k — 1, 
hence 
The key point is that ^ is concentrated near v hence this term is essentially negligible:
m^Lpf] = E[[(l - ^/2^)^/2v^'^^f] + CE/j'^"^ 



V2v 
2 



[(1 - ^/2h/v^l^)^/2hfAhe-^'''dh + o(l) 







POO 



(y(2)6)V(^'')'d(\/26)2 + o(l). 



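This shows the weak limit of $\frac{1}{\sqrt{\nu}} L_\beta$ has density $4b e^{-2b^2} \, db \, 1_{\{b > 0\}}$. This is the density of the
square root of an exponential random variable $X$ with mean 1/2. Similarly, $\frac{1}{\sqrt{\nu}} U_\beta$ weakly
converges to $\sqrt{X}$ as well. Thus $\frac{1}{\sqrt{\nu}} H(\bar p)$ converges to the same thing as
$\frac{1}{\sqrt{\nu}} L_\beta$ and $\frac{1}{\sqrt{\nu}} U_\beta$.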

Note that if we define $\bar p = \arg\max_{p : y(p) > x(p)} d(p, \{x = y\})$, then $\frac{1}{\sqrt{\nu}} H(\bar p)$ converges to the
square root of an exponential random variable with mean 1. This corresponds to the following
one-sided Spearman's uniform metric on $S_n$:

$$\tilde\rho_\infty(\sigma) = \max_i (i - \sigma(i)).$$

□

Corollary 7.5. $\frac{1}{\sqrt{n}}[n - \rho_\infty(\sigma)]$ converges weakly to the square root of an exponential random
variable with mean 1/2.

Note that we are not able to show convergence in moments, because the de-Poissonization
procedure requires a bounded test function.

Next we examine the relation of $\rho_\infty$ with $\rho_q$ for finite $q$ as well as the lengths of the longest
increasing and decreasing subsequences, denoted $I(\sigma)$ and $D(\sigma)$ respectively. First we need
a lemma.

Lemma 7.6. Fix $m_1, m_2, \ldots, m_\infty > 0$, and consider the events $A_j = \{\rho_j(\sigma) \le m_j\}$, $1 \le j \le \infty$.
Let $B \subset \mathbb{N} \cup \{\infty\}$ be any subset. Then $P_n[\cap_{j \in B} A_j]$ is a weakly decreasing sequence in $n$.

Proof. For any $\sigma \in S_n$, we construct a random element $\tau \in S_{n+1}$ by the Chinese restaurant
process, i.e., either letting $n + 1$ be a fixed point of $\tau$ or inserting $n + 1$ into an existing
cycle randomly. In other words, with probability $\frac{1}{n+1}$ we let $\tau(n+1) = n+1$ and $\tau(j) = \sigma(j)$
for $j \in [n]$, and with the remaining probability we choose an element $k \in [n]$ uniformly at
random, and modify $\sigma$ by letting $\tau(k) = n + 1$, $\tau(n + 1) = \sigma(k)$ and $\tau(j) = \sigma(j)$ for all other
$j \in [n] \setminus \{k\}$. For each $\rho_q$, this either introduces a new term $|n + 1 - (n + 1)|^q$, or replaces
the term $|k - \sigma(k)|^q$ by two new terms $|k - (n + 1)|^q + |n + 1 - \sigma(k)|^q \ge |k - \sigma(k)|^q$. Hence
$\rho_q(\tau) \ge \rho_q(\sigma)$. Thus $\rho_j(\tau) \le m_j$ for all $j \in B$ implies $\rho_j(\sigma) \le m_j$ for all $j \in B$, and the
assertion follows. □
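The Chinese restaurant insertion and the resulting monotonicity can be verified exhaustively for small $n$. The sketch below is illustrative: permutations are stored as tuples with `sigma[i - 1]` equal to $\sigma(i)$ (values are 1-indexed to match the text), the function names are hypothetical, and `k=None` encodes the choice of making $n + 1$ a fixed point.

```python
from itertools import permutations

def crp_insert(sigma, k=None):
    """Extend sigma in S_n to tau in S_{n+1}. With k=None, n+1 becomes a fixed point;
    otherwise tau(k) = n+1 and tau(n+1) = sigma(k), for k in 1..n."""
    n = len(sigma)
    tau = list(sigma) + [n + 1]
    if k is not None:
        tau[n] = sigma[k - 1]   # tau(n+1) = sigma(k)
        tau[k - 1] = n + 1      # tau(k) = n+1
    return tuple(tau)

def rho(sigma, q):
    """Sum over i of |i - sigma(i)|^q, with 1-indexed positions and values."""
    return sum(abs(i - s) ** q for i, s in enumerate(sigma, start=1))

def rho_inf(sigma):
    """max over i of |i - sigma(i)|."""
    return max(abs(i - s) for i, s in enumerate(sigma, start=1))

def all_extensions(n):
    """All pairs (sigma, tau) with sigma in S_n and tau one of its n+1 extensions."""
    out = []
    for sigma in permutations(range(1, n + 1)):
        for k in [None] + list(range(1, n + 1)):
            out.append((sigma, crp_insert(sigma, k)))
    return out
```

Each of the $n!\,(n+1)$ pairs produces a distinct element of $S_{n+1}$, so choosing $\sigma$ uniformly and the insertion uniformly among the $n + 1$ options yields the uniform measure on $S_{n+1}$, and along every pair the statistics $\rho_q$ and $\rho_\infty$ do not decrease.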
Remark 3. The Chinese restaurant coupling of $S_n$ with $S_{n+1}$ does not give monotonicity of
the distribution of $I(\sigma)$ and $D(\sigma)$, as the process of inserting $n + 1$ into a cycle can destroy
an increasing or decreasing subsequence.

Theorem 7.2. Fix $k \in \mathbb{N}$. Then $\frac{1}{\sqrt{n}}[n - \rho_\infty(\sigma)]$ is asymptotically weakly independent
from the following random vector $(\hat\rho_1, \ldots, \hat\rho_k)$, where $\hat\rho_q = \frac{1}{\sqrt{\mathrm{var}\,\rho_q}}(\rho_q - E\rho_q)$.

Proof. For simplicity, we only consider the case of a single $q$; the more general vector version
follows the same reasoning.

Using the monotonicity lemma above, and the general de-Poissonization theorem of Johansson,
it suffices to prove asymptotic independence under the Poissonized Hammersley's
point process with rate $\nu$, as $\nu$ goes to infinity. To be precise, for an instance $P$ of the
Poisson point process on $[0, 1]^2$, define

$$\rho_q(P) = \sum_{p \in P} |X(p) - Y(p)|^q,$$

and similarly, $\rho_\infty(P) = \max_{p \in P} |X(p) - Y(p)|$ as before.

It is known that Spearman's footrule has mean of order $n^2$ and variance of order $n^3$, and
Spearman's rho has mean of order $n^3$ and variance of order $n^5$. In general it's not too hard to
show $\rho_q$ has mean of order $n^{q+1}$ and variance of order $n^{2q+1}$.

Then by Poissonization we know E^pg = + 0{u'^), and var'^p^ = VgU^'i+'^ + 0{u'^'^), 

where pg, Vg > 0, and that pg converges to a standard normal variable A^^ upon normalization. 
We need to show that conditional on p^ a typical value, Pg still converges to a standard 

normal variable that's close to A^^^. 

So conditional on -^[u — poo] > we have g{p) > ^(1 — /3) + o{iy~^/'^) and p = p* with 
high probability. Thus A/'(A^) = where = {p '■ g{p) ^ diP)} we have a Poisson 
point process Ui on [0, 1]^ with uniform rate v on the region i?^ :— {g < gip)}, and rate 
on its complement A^. 

Now consider a new point process U2 on [0, 1]^ coupled to Ui so that U2 equals Ui on 
and in the complement of R/^, f/2 is given by an independent Poisson point process with rate u. 
Thus U2 is a point process with uniform rate on the entire square. With high probability, we 
have P < u'^^'^logu, and also with high probabihty under C/2, A/'(A^) < (logz/)^. Therefore 

Therefore we have shown that for all there is a coupling under which 

I' YQX^Pg 

in probabihty. In particular this imphes ^^Pi\P'=^>^-^'' ' ^-i^t converges weakly to a standard 
Gaussian for all ^. Hence pg is asymptotically independent of poo- CH 

Next we consider the correlation between $\rho_\infty$ and the first $k$ rows and columns of the 
RSK algorithm output. Let $I_1(\sigma), \ldots, I_k(\sigma)$ be the lengths of the first $k$ rows of the Young 
diagram obtained from the RSK algorithm applied to $\sigma$. Similarly let $D_1(\sigma), \ldots, D_k(\sigma)$ be 
the lengths of the first $k$ columns. Curtis Greene has shown that $\sum_{i=1}^k I_i(\sigma) = \max |\cup_{i=1}^k S^I_i|$ 
where $S^I_i$ are increasing subsequences of $\sigma$. Similarly $\sum_{i=1}^k D_i(\sigma) = \max |\cup_{i=1}^k S^D_i|$ where $S^D_i$ are 
decreasing subsequences of $\sigma$. Thus for example, if $k = 1$, $I_1(\sigma)$ is the length of the longest 
increasing subsequence, which was known long before Greene's result. 
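Greene's identity for the rows can be checked by brute force on small permutations. The following is a minimal sketch, assuming standard row insertion for RSK; the helper names are hypothetical, and the exhaustive search over unions of increasing subsequences is only feasible for very small $n$.

```python
import bisect
from itertools import combinations_with_replacement, permutations


def rsk_shape(sigma):
    """Row lengths of the RSK insertion tableau of sigma (row insertion)."""
    rows = []
    for x in sigma:
        for row in rows:
            pos = bisect.bisect_right(row, x)  # first entry larger than x
            if pos == len(row):
                row.append(x)                  # x fits at the end of this row
                break
            row[pos], x = x, row[pos]          # bump the displaced entry down
        else:
            rows.append([x])                   # bumped out of every row
    return [len(row) for row in rows]


def increasing_masks(sigma):
    """Bitmasks of all position sets on which sigma is increasing."""
    n = len(sigma)
    masks = []
    for mask in range(1 << n):
        vals = [sigma[i] for i in range(n) if (mask >> i) & 1]
        if all(a < b for a, b in zip(vals, vals[1:])):
            masks.append(mask)
    return masks


def max_union_of_increasing(sigma, k):
    """Largest size of a union of k increasing subsequences (exhaustive search)."""
    best = 0
    for combo in combinations_with_replacement(increasing_masks(sigma), k):
        union = 0
        for m in combo:
            union |= m
        best = max(best, bin(union).count("1"))
    return best


# Check sum of the first k row lengths against the brute-force maximum on S_5.
greene_holds = all(
    sum(rsk_shape(p)[:k]) == max_union_of_increasing(p, k)
    for p in permutations(range(1, 6))
    for k in (1, 2)
)
```

The column statement can be checked the same way by replacing increasing with decreasing subsequences and row lengths with column lengths.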

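Theorem 7.3. $\rho_\infty$ is asymptotically independent of the random vector $(I_1, \ldots, I_k, D_1, \ldots, D_k)$ 
in the following sense. Let $\hat\rho_\infty = \frac{1}{\sqrt{n}}(n - \rho_\infty)$ be the normalized version of $\rho_\infty$, and $\hat I_j = 
n^{-1/6}(I_j - 2\sqrt{n})$, $\hat D_j = n^{-1/6}(D_j - 2\sqrt{n})$ the normalized versions of $I_j$ and $D_j$ respectively. Then 
for fixed $m^I_1, \ldots, m^I_k, m^D_1, \ldots, m^D_k, m_\rho$, 

$$\lim_{n \to \infty} P_n[\hat\rho_\infty < m_\rho, \hat I_j > m^I_j, \hat D_j > m^D_j, j \in [k]] - P_n[\hat\rho_\infty < m_\rho] P_n[\hat I_j > m^I_j, \hat D_j > m^D_j, j \in [k]] = 0.$$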
Proof. Let $Q_n = P_n[\hat\rho_\infty > m_\rho, \hat I_j > m^I_j, \hat D_j > m^D_j, j \in [k]]$. We first show that 

Q„ < (1 + ^l^)Q„+i. (2) 

n 

We consider the following coupling between a uniformly random element $\sigma \in S_n$ and 
$\tau \in S_{n+1}$. Let $\sigma$ be given by an instance $P$ of the Hammersley's point process, then $\tau$ 
is given by adding another point uniformly in $[0, 1]^2$ and independently from the points in 
$P$. Using Greene's interpretation, it is evident that $I_j(\tau) \ge I_j(\sigma)$ and $D_j(\tau) \ge D_j(\sigma)$ for all 
$j$. Furthermore given $\sigma$, $\rho_\infty(\tau) > \rho_\infty(\sigma)$ only if the new point $p_{n+1}$ lands in the union of two 
triangular regions ^g{p)- By a direct computation, we see that ^ — g{P) is dominated by the 
square root of a geometric random variable with mean of order (see also Lemma 7.4). 
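The coupling between $\sigma$ and $\tau$ can be simulated directly. The sketch below assumes the usual convention that a point configuration in the unit square determines a permutation by sorting the points by $x$ and reading off the ranks of their $y$ coordinates; it only checks the case $j = 1$ (longest increasing and decreasing subsequences), and all names are illustrative.

```python
import bisect
import random


def points_to_perm(points):
    """Sort points by x and return the 1-based ranks of their y coordinates."""
    by_x = sorted(points)
    ys = sorted(p[1] for p in by_x)
    return [bisect.bisect_left(ys, p[1]) + 1 for p in by_x]


def lis_length(seq):
    """Length of the longest strictly increasing subsequence (patience sorting)."""
    tails = []
    for x in seq:
        pos = bisect.bisect_left(tails, x)
        if pos == len(tails):
            tails.append(x)
        else:
            tails[pos] = x
    return len(tails)


def lds_length(seq):
    """Length of the longest strictly decreasing subsequence."""
    return lis_length([-x for x in seq])


rng = random.Random(0)  # fixed seed so the experiment is reproducible
monotone = True
for _ in range(200):
    pts = [(rng.random(), rng.random()) for _ in range(30)]
    sigma = points_to_perm(pts)
    # tau: the same configuration plus one extra independent uniform point
    tau = points_to_perm(pts + [(rng.random(), rng.random())])
    monotone = monotone and lis_length(tau) >= lis_length(sigma)
    monotone = monotone and lds_length(tau) >= lds_length(sigma)
```

Adding a point never destroys an existing increasing or decreasing subsequence, which is why the inequality holds on every sample and not just with high probability.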

Hence with high probability (1 — 0(n~'^)), vol(Aj,(-p)) < £i2£li_ Therefore 

Pn[Poo(r) > Poo (a)] < + 0{n~^). 

These two considerations imply (2). Thus, combined with the De-Poissonization Corollary 7.1, 
it suffices to show 

$$\lim_{\nu \to \infty} P^\nu[\hat\rho_\infty < m_\rho, \hat I_j < m^I_j, \hat D_j < m^D_j, j \in [k]] - P^\nu[\hat\rho_\infty < m_\rho] P^\nu[\hat I_j < m^I_j, \hat D_j < m^D_j, j \in [k]] = 0,$$

where $\hat I_j$ is defined to be the normalized maximum size of the union of $j$ increasing 
subsequences in the Hammersley's square, with the normalization scale $n$ replaced by $\nu$. The 
other variables are defined similarly. The idea is similar to the proof of Theorem 7.2; here 
we provide a bit more detail. 

We first condition on $G(p) := 1 - \sqrt{2}\, g(p)$. With high probability $G(p) < \nu^{-1/2} \log \nu$. 
Conditional on $G(p)$, $H(p)$ is a mixture of Poisson random variables with rates bounded in 
the interval $[\nu G(p) - c \log \nu, \nu G(p) + c \log \nu]$, for some constant $c$. Since a Poisson distribution 
of rate $\sqrt{\nu}$ has mean $\sqrt{\nu}$ and variance of order $\nu^{1/2}$, and in fact behaves like a Gaussian near 
its mean, we have 

r|G(p)e|/;-l2|il,^ + !2S!:]|^ = ^] = i + „(i). 






Therefore if we let $\Omega = \Omega(m^I_j, m^D_j) := \{\hat I_j > m^I_j, \hat D_j > m^D_j, j \in [k]\}$, then 

F'in n {Gip) <(3- ^}) + o(i) < P'^(^] n {H{p) < /?}) 

< F-'in n {G{p) <(3 + i^}) + 0(1). 



Thus if one can show $\Omega$ and $G(p)$ are asymptotically independent, then the two sides of 
the inequalities above would be asymptotically equal (because $G(p)$ has a continuous 
distribution function), which would also imply the asymptotic independence of $H(p)$ and $\Omega$. Since 
we know $H(p) = \nu - \rho_\infty(P)$ with high probability, this would further imply the asymptotic 
independence of $\Omega$ and $\rho_\infty(P)$. So it remains to show $\Omega$ and $G(p)$ are asymptotically 
independent. 

Conditioning on $G(p) < \beta$ is equivalent to conditioning on $N(A_\beta) = 0$, i.e., there 

are no points in the top left and bottom right corner isosceles right triangles of leg length 
The point process on the complement of $A = A_\beta$ has a uniform Poisson rate of 
$\nu$. Call this point process $U_1$. Now if we construct $U_2$ on $[0, 1]^2$ based on $U_1$ by adding an 
independent point process on $A$, then $U_2$ is simply a PPP of uniform rate $\nu$ on the entire 
square. Since adding points can only increase the values of $I_j, D_j$ for any $j$, we get the 
following bound: 

$$P^\nu[\Omega \mid G(p) < \beta] \le P^\nu[\Omega].$$

Next suppose an increasing subsequence S contains a point in A, without loss of generality, 
say S G {x < y} the upper left component of A. Then the remaining portion of S is 
contained in the region := U := {x < ^} U > 1 - ^}. But since P''[7V(F^) > 

^] = 0(1) and F-^iAriVy) > 1^] = 0(1). Hence 

P^[|5|>^|SnA^0] = o(l). (3) 

Now consider the events $B_1 := \cup_{j \le k}\{I_j(U_1) \ne I_j(U_2)\}$ and $B_2 := \cup_{j \le k}\{D_j(U_2) - D_j(U_1) > 
\log \nu\}$. If we can show 

$$P^\nu(B_1 \cup B_2) = o(1), \qquad (4)$$

then the vector 

$$\frac{1}{\nu^{1/6}} \big(I_1(U_1) - I_1(U_2), \ldots, I_k(U_1) - I_k(U_2), D_1(U_1) - D_1(U_2), \ldots, D_k(U_1) - D_k(U_2)\big)$$

converges to 0 in probability, which would imply the conditional distribution $\mathcal{L}(\hat I_j, \hat D_j, j \in 
[k] \mid G(p) < \beta)$ converges weakly to the unconditional $\mathcal{L}(\hat I_j, \hat D_j, j \in [k])$, from which 
asymptotic independence of $\Omega$ and $G(p)$ is immediate. 

$B_1$ implies there exist some $p \in A$, and an increasing subsequence $S^*$ containing $p$ such 
that $|S^*| \ge \min_{j \le k} |I_j(U_2)|$. By (3), this event has probability $o(1)$. On the other hand, $B_2$ 
implies the number of points in $U_2 \setminus U_1$, i.e., the extra points in $A$, is more than $\log \nu$. But this 
also has vanishing probability due to the fact that $\mathrm{vol}(A) = o(\frac{\log \nu}{\nu})$ with high probability, 
and the Poisson mean of $U_2$ on $A$ is proportional to $\nu\,\mathrm{vol}(A)$. 

□ 
Theorem 7.4. Under uniform measures on $S_n$, Spearman's uniform metric $\rho_\infty(\sigma)$ is 
asymptotically independent from any sequence of class functions $f_n$ on $S_n$ that has a weak limit. 

Remark 4. We use the word metric loosely here to mean a univariate function on Sn given 
by the distance between its argument and the identity element. 

Proof. We will show that for almost all $\lambda \vdash n$, conditional on the cycle type of $\sigma$ being $\lambda$, 
$\rho_\infty(\sigma)$ has the same law as the unconditional law with high probability. 

First note that the same strategy used to prove independence of $\rho_q(\sigma)$ and $f(\sigma)$ would 
not work here since changing even one bracket of $\sigma$ can change the value of $\rho_\infty$ a lot. But the 
crucial observation here is that the bad bracketing positions occur with vanishing probability. 
In order to get a good estimate of this probability, we first need to symmetrize the record 
map as follows. 

Recall the record map $r$ does the following to a permutation $\sigma \in S_n$: 

1. it arranges the elements in its cycles cyclically so that the biggest element appears in 
the first position, and then arranges the cycles by the increasing order of their first 
elements. The end result is still $\sigma$. 

2. it removes all the brackets and views the resulting $n$-sequence as the second row in the 
2-line notation of a permutation. 

It is well-known that $r : S_n \to S_n$ is a bijection. Now for $\tau \in S_n$ define 

$$r_\tau(\sigma) = \tau^{-1} r(\tau \sigma \tau^{-1}) \tau.$$

In words, this means we perform the record map on $\sigma$, with the natural ordering on $[n]$ 
replaced by the new ordering defined by $i <_\tau j$ if $\tau(i) < \tau(j)$. 
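The two steps of the record map with the natural ordering can be written out as follows. This is a minimal sketch, assuming permutations are stored in one-line notation as tuples $(\sigma(1), \ldots, \sigma(n))$; the function names are hypothetical. The inverse cuts the word before each left-to-right maximum, which is what makes $r$ a bijection.

```python
from itertools import permutations


def cycles_of(sigma):
    """Cycles of sigma, given in one-line notation (sigma(1), ..., sigma(n))."""
    seen, out = set(), []
    for start in range(1, len(sigma) + 1):
        if start in seen:
            continue
        cyc, i = [], start
        while i not in seen:
            seen.add(i)
            cyc.append(i)
            i = sigma[i - 1]
        out.append(cyc)
    return out


def record_map(sigma):
    """Step 1: rotate each cycle to start at its maximum, sort cycles by first entry.
    Step 2: drop the brackets and read the word as one-line notation."""
    rotated = []
    for cyc in cycles_of(sigma):
        m = cyc.index(max(cyc))
        rotated.append(cyc[m:] + cyc[:m])
    rotated.sort(key=lambda c: c[0])
    return tuple(x for cyc in rotated for x in cyc)


def inverse_record_map(word):
    """Cut the word before each left-to-right maximum and close each block into a cycle."""
    cycles = []
    for x in word:
        if not cycles or x > cycles[-1][0]:
            cycles.append([x])
        else:
            cycles[-1].append(x)
    sigma = [0] * len(word)
    for cyc in cycles:
        for a, b in zip(cyc, cyc[1:] + cyc[:1]):
            sigma[a - 1] = b
    return tuple(sigma)


# The images of all of S_4 under the record map; a bijection hits all 24 elements.
images = {record_map(p) for p in permutations(range(1, 5))}
```

For example, $\sigma = (124)(536)$ is `(2, 4, 6, 1, 3, 5)` in one-line notation and maps to the word `(4, 1, 2, 6, 5, 3)`.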

Finally recall that $S^\lambda$ denotes the set of permutations with cycle structure $\lambda$, and the 
map $\varphi_\lambda : S_n \to S^\lambda$ takes each $\sigma$ to $(\sigma(1), \ldots, \sigma(\lambda_1))(\sigma(\lambda_1 + 1), \ldots, \sigma(\lambda_1 + \lambda_2)) \cdots (\sigma(n - \lambda_\ell + 
1), \ldots, \sigma(n))$. $\varphi_\lambda$ takes the second row of $\sigma$ under 2-line notation and inserts brackets into 
it to arrive at a new permutation in the cycle notation. 

For each r, r' G Sn, and a uniform in Sn, v^a oT:J-o</?(n) °'"t(c) is uniform on Sn- Therefore 
if we take r uniformly random and independent from a, the result is still uniform on Sn- 

Next observe that $\varphi_{(n)} \circ r_\tau(\sigma)$ picks uniformly a symbol $i_j$ from each cycle $C_j$ of $\sigma$ and 
changes the value $\sigma(i_j)$; to us it's not important what the modified values are. For instance, 
if $\sigma = (124)(536)$, then with probability $(1/3)^2$, $r_\tau(\sigma)$ picks 2 from the first cycle and 6 from 
the second cycle, and changes the value of $\sigma(2)$ to 5 and $\sigma(6)$ to 4. 
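The example $\sigma = (124)(536)$ can be reproduced mechanically. The sketch below implements the "in words" description of $r_\tau$ (the record map with the ordering $<_\tau$) together with the bracket insertion map; permutations are tuples in one-line notation, the function names are hypothetical, and the particular $\tau$ is one choice among many that makes 4 and 5 the $\tau$-largest elements of their cycles.

```python
from fractions import Fraction
from itertools import permutations


def cycles_of(sigma):
    """Cycles of sigma, given in one-line notation (sigma(1), ..., sigma(n))."""
    seen, out = set(), []
    for start in range(1, len(sigma) + 1):
        if start in seen:
            continue
        cyc, i = [], start
        while i not in seen:
            seen.add(i)
            cyc.append(i)
            i = sigma[i - 1]
        out.append(cyc)
    return out


def record_word(sigma, tau):
    """Record map with the order i <_tau j iff tau(i) < tau(j); returns the word."""
    key = lambda i: tau[i - 1]
    rotated = []
    for cyc in cycles_of(sigma):
        m = cyc.index(max(cyc, key=key))
        rotated.append(cyc[m:] + cyc[:m])
    rotated.sort(key=lambda c: key(c[0]))
    return tuple(x for cyc in rotated for x in cyc)


def insert_brackets(word, lam):
    """Cut the word into consecutive blocks of sizes lam and read each block as a cycle."""
    sigma, pos = [0] * len(word), 0
    for size in lam:
        block = word[pos:pos + size]
        for a, b in zip(block, block[1:] + block[:1]):
            sigma[a - 1] = b
        pos += size
    return tuple(sigma)


sigma = (2, 4, 6, 1, 3, 5)   # (124)(536) in one-line notation
tau = (1, 2, 3, 5, 6, 4)     # makes 4 and 5 the tau-largest elements of their cycles
word = record_word(sigma, tau)
new_sigma = insert_brackets(word, (6,))
changed = [i for i in range(1, 7) if sigma[i - 1] != new_sigma[i - 1]]

# Fraction of all tau in S_6 that produce exactly this modification.
hits = sum(
    insert_brackets(record_word(sigma, t), (6,)) == new_sigma
    for t in permutations(range(1, 7))
)
prob = Fraction(hits, 720)
```

Here `word` is `(4, 1, 2, 5, 3, 6)`, only positions 2 and 6 change, and the enumeration over $\tau$ recovers the probability $(1/3)^2$ stated in the text.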

Let $i^*(\sigma) = \operatorname{argmax}_i |i - \sigma(i)|$, taken to be the smallest such $i$ if there is more than one 
maximizer. 

We will first show that these modifications do not affect $\rho_\infty(\sigma)$ with high probability, i.e., 

$$P[\rho_\infty(\sigma) \ne \rho_\infty(\varphi_{(n)} \circ r_\tau(\sigma))] = o(1).$$

First we need a lemma. 

Lemma 7.7. Under the uniform measure, $i^*(\sigma)$ resides in a cycle of length at least $(\log n)^4$ 
with high probability. 
Proof. Let $a_i(\sigma)$ be the number of $i$-cycles in $\sigma$. By a result of Arratia, Barbour, and Tavaré, 
we have the following approximate coupling result 

$$\| \mathcal{L}(a_1, \ldots, a_{(\log n)^4}) - \otimes_{1 \le i \le (\log n)^4}\, \mu_{1/i} \|_{TV} = o(1)$$

where $\mu_{1/i}$ are independent Poisson distributions of rate $1/i$. 
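As a sanity check on the rates $1/i$, the exact mean of $a_i$ under the uniform measure on $S_n$ equals $1/i$ for every $i \le n$, matching the mean of the limiting Poisson variable. The sketch below verifies this by enumeration for $n = 6$; it checks first moments only, not the total variation statement, and the names are illustrative.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def cycle_counts(sigma):
    """counts[i] = number of i-cycles of sigma, a permutation of range(n) in one-line form."""
    n = len(sigma)
    counts = [0] * (n + 1)
    seen = [False] * n
    for start in range(n):
        if seen[start]:
            continue
        length, i = 0, start
        while not seen[i]:
            seen[i] = True
            i = sigma[i]
            length += 1
        counts[length] += 1
    return counts


n = 6
totals = [0] * (n + 1)
for p in permutations(range(n)):
    for length, c in enumerate(cycle_counts(p)):
        totals[length] += c

# Exact expected number of i-cycles for i = 0, 1, ..., n (index 0 is unused).
means = [Fraction(t, factorial(n)) for t in totals]
```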

We will call a cycle small if its length is less than $(\log n)^4$. Using Chebyshev inequality, we 
can show that the union of all small cycles has size less than $(\log n)^9$ with high probability: 

$$\begin{aligned}
P\Big[\sum_{i=1}^{(\log n)^4} i\,a_i > (\log n)^9\Big] &\le E\Big[\exp\Big(t \sum_{i=1}^{(\log n)^4} i\,\mu_{1/i}\Big)\Big] \Big/ \exp(t(\log n)^9) + o(1) \\
&= \prod_{i=1}^{(\log n)^4} e^{\frac{1}{i}(e^{it} - 1)} \Big/ \exp(t(\log n)^9) + o(1) \\
&= \exp\Big(\sum_{i=1}^{(\log n)^4} \frac{1}{i}(e^{it} - 1) - t(\log n)^9\Big) + o(1),
\end{aligned}$$

so if we choose $t = (\log n)^{-4}$, we can approximate $e^{it} - 1$ by $it$ for $i \le (\log n)^4$ and the last 
quantity is bounded by $\exp(-(\log n)^5 + O(1)) + o(1) = o(1)$. 

Now fix a typical cycle type $\lambda$ as above. Let $C_0 = C_0(\lambda)$ be the union of all the small 
parts of $\lambda$. Let $\sigma$ be uniformly chosen from $S^\lambda$. We can estimate the distribution of $\rho_\infty$ on 
$\sigma$ restricted to $C_0$ as follows: 

¥s>.[pMco) <n- > (1 - 2n-V9)ICo| = 1 + o(l) 

for I Co I < (logn)^. This uses the fact that if the symbols of a\co are all contained in 
[n^/^, n — n^/^], then Poo(c|co) < ^ ~ 'i^^^^- 

If we can show that Poo(c|[n]\Co) > n — rtJl^ with high probability, then must reside 
in one of the big cycles. In fact we can replace [n] \ Co by just the longest cycle C\. It is 
well-known that the normalized longest cycle length converges to the Dickman distribution 
which is strictly positive on (0, 1), hence |Ci| > n^/^ with high probability. 

We will consider consecutive pairs of adjacent symbols in $\sigma|_{C_1}$. One way of choosing a 
uniformly random element from $S^\lambda$ is by choosing a uniformly random string $s = (s_1, s_2, \ldots, s_n)$ 
of length $n$ consisting of nonrepeating elements in $[n]$, and then imposing the appropriate 
brackets on $s$ to have the desired cycle structure $\lambda$. Alternatively one can lay down the 
brackets first and then fill in the entries $s_1, s_2, \ldots$ sequentially by sampling uniformly without 
replacement from [n]. We adopt the second point of view and let the left-most bracket corre- 
sponds to the longest cycle. Consider the pairs of elements (s2fe-i, S2fe) for /c = 1, . . . , n^/^. For 
each k, the probability that |s2fe-i — S2k\ > n — nJ^^ given the previous s/s is bounded below 
by — — — — = fi(n~^/^), which comes from the worst case when all the previous (s2j-i, S2j) 
pairs land in [n''/^] U [n — n'^^^,n]. Therefore the probability that |s2fe-i ~ S2k\ < n — rJ^^ 
for all k < np/^ is bounded above by (1 — n~*^/^)"^ ^ = o(l). This concludes the proof of the 
lemma. □ 
Using the lemma, we see that for asymptotically almost all $\sigma$, $\rho_\infty(\varphi_{(n)} \circ r_\tau(\sigma)) = \rho_\infty(\sigma)$, 
because if $i^*(\sigma)$ lies in a cycle of length $k > (\log n)^4$, then the probability that $i^*(\sigma)$ is chosen 
in the bracket removing process of $r_\tau$ would be $1/k$. Next we see that $r_{\tau'}$ essentially permutes 
the sequence $r_\tau(\sigma)$ cyclically in a uniformly random way. Therefore for a cycle structure $\lambda$ 
of at most say $(\log n)^2$ cycles, which is with high probability, the chance that $\varphi_\lambda \circ r_{\tau'}(\sigma)$ 
modifies a particular value $\sigma(i^*)$ is bounded above by $(\log n)^2/n$. Thus we have shown the 
following: 

for almost all $\lambda$ under the uniform measure on $S_n$, and almost all $\sigma \in S_n$, 

$$P[\rho_\infty(\varphi_\lambda \circ r_{\tau'} \circ \varphi_{(n)} \circ r_\tau(\sigma)) \ne \rho_\infty(\sigma)] = o(1).$$

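Since $\varphi_\lambda \circ r_{\tau'} \circ \varphi_{(n)} \circ r_\tau : S_n \to S^\lambda$ pushes forward the uniform measure to the uniform 
measure, we have 

$$\begin{aligned}
P_{S^\lambda}[\rho_\infty(\sigma) < \alpha] &= P_{S_n}[\rho_\infty(\varphi_\lambda \circ r_{\tau'} \circ \varphi_{(n)} \circ r_\tau(\sigma)) < \alpha] \\
&= P_{S_n}[\rho_\infty(\sigma) < \alpha] + o(1).
\end{aligned}$$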

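Finally by Bayes' rule, 

$$\begin{aligned}
P[\rho_\infty(\sigma) < \alpha, f(\sigma) < \beta] &= E[P[\rho_\infty(\sigma) < \alpha, f(\sigma) < \beta \mid \lambda(\sigma)]] \\
&= E[P[\rho_\infty(\sigma) < \alpha \mid \lambda]\, 1_{\{f(\sigma) < \beta\}} + o(1)] + o(1) \\
&= P[\rho_\infty(\sigma) < \alpha]\, P[f(\sigma) < \beta] + o(1).
\end{aligned}$$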

□ 